42%
11.09.2023
range and a capacity pushing 20TB or more. Many (most?) MPI applications still do I/O with the rank 0 process, and this amount of local storage with fantastic performance just begs for users to run
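The rank 0 pattern mentioned here is easy to picture: every rank ships its data to rank 0, and rank 0 alone touches storage, so all of the I/O lands on that one node's local drives. A toy sketch of the pattern (my illustration, not code from the article):

/* rank0_io.c - toy illustration of the "rank 0 does all the I/O" pattern.
 * Build with an MPI wrapper, e.g.: mpicc rank0_io.c -o rank0_io */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank;        /* stand-in for a rank's result */
    double *all = NULL;
    if (rank == 0)
        all = malloc(size * sizeof(double));

    /* Every rank ships its data to rank 0 ... */
    MPI_Gather(&local, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ... and only rank 0 writes, so the I/O hits that node's storage. */
    if (rank == 0) {
        FILE *fp = fopen("results.dat", "w");
        if (fp) {
            for (int i = 0; i < size; i++)
                fprintf(fp, "%f\n", all[i]);
            fclose(fp);
        }
        free(all);
    }

    MPI_Finalize();
    return 0;
}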
51%
20.03.2023
is important because it includes where things like MPI libraries or profilers are located, as well as where compilers and their associated tools are located. I discuss these concerns as the article progresses
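A quick way to confirm what the environment actually points to is to ask the shell where it resolves the tools in question (generic commands, nothing article-specific):

$ which mpicc
$ which gcc
$ echo $PATH

If which reports a path under an unexpected MPI installation or module tree, that is usually the first clue that the environment is not what you think it is.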
41%
17.01.2023
is more important than some people realize. For example, I have seen Message Passing Interface (MPI) applications that have failed because the clocks on two of the nodes were far out of sync.
Next, you
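A rough way to spot that kind of skew is to have every rank sample the wall clock just after a barrier and let rank 0 print the offsets. This sketch is mine, not the article's; because barrier exit times differ by network latency, it only reveals skew much larger than that, which is exactly the failure mode described:

/* clockskew.c - report each rank's wall-clock offset from rank 0.
 * Build: mpicc clockskew.c -o clockskew; run one rank per node. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Barrier(MPI_COMM_WORLD);        /* rough common starting point */

    struct timeval tv;
    gettimeofday(&tv, NULL);
    double now = tv.tv_sec + tv.tv_usec / 1.0e6;

    double *t = NULL;
    if (rank == 0)
        t = malloc(size * sizeof(double));
    MPI_Gather(&now, 1, MPI_DOUBLE, t, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("rank %4d offset vs rank 0: %+.3f s\n", i, t[i] - t[0]);
        free(t);
    }

    MPI_Finalize();
    return 0;
}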
40%
13.12.2022
the Compute Hostname
$ ping n0001
PING n0001-default (10.0.1.1) 56(84) bytes of data.
64 bytes from n0001-default (10.0.1.1): icmp_seq=1 ttl=64 time=2.20 ms
64 bytes from n0001-default (10.0.1.1): icmp_seq=2
44%
13.06.2022
be turned on and off according to what you want to check about the state of the node.
Almost 20 years ago, when I worked for a Linux high-performance computing (HPC) company that no longer exists, we had
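The toggle idea is easy to express in code: keep a table of named checks, each with an enabled flag, and run only the enabled ones. A toy sketch (mine, not the tool from that company):

/* checks.c - toy table-of-checks sketch. Build: cc checks.c -o checks */
#include <stdio.h>

static int check_disk(void)  { return 0; }  /* e.g., statfs() free space */
static int check_clock(void) { return 0; }  /* e.g., compare against NTP */

struct check {
    const char *name;
    int enabled;                /* flip on or off per site policy */
    int (*run)(void);           /* 0 = healthy, nonzero = failed  */
};

static struct check checks[] = {
    { "disk",  1, check_disk  },
    { "clock", 0, check_clock },    /* disabled: skipped below */
};

int main(void)
{
    int failed = 0;
    for (size_t i = 0; i < sizeof(checks) / sizeof(checks[0]); i++) {
        if (!checks[i].enabled)
            continue;
        int rc = checks[i].run();
        printf("%s: %s\n", checks[i].name, rc ? "FAIL" : "ok");
        failed |= rc;
    }
    return failed;      /* nonzero exit flags an unhealthy node */
}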
43%
24.02.2022
-o /lustre/test.01
IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began : Tue Jan 25 20:02:21 2022
Command line : /usr/local/bin/ior -F -w -t 64m -k --posix.odirect -D 60 -u -b 5g
43%
09.12.2021
Interface (MPI) standard, so it’s parallel across distributed nodes. I will specifically call out this tool.
The general approach for any of the multithreaded utilities is to break the file into chunks, each
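The chunking approach is easy to see in miniature. The sketch below is my own, not one of the utilities the article covers: the source file is split into one byte range per thread, and each thread copies its range independently with pread() and pwrite():

/* chunkcopy.c - sketch of a chunked, multithreaded file copy.
 * Build: cc -pthread chunkcopy.c -o chunkcopy
 * Error handling is trimmed to keep the idea visible. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define NTHREADS 4
#define BUFSZ (1 << 20)                 /* 1MB I/O buffer per thread */

static int src_fd, dst_fd;

struct range { off_t off, len; };       /* one byte range per thread */

static void *copy_range(void *arg)
{
    struct range *r = arg;
    char *buf = malloc(BUFSZ);
    off_t done = 0;
    while (done < r->len) {
        size_t want = (r->len - done < BUFSZ) ? (size_t)(r->len - done)
                                              : BUFSZ;
        ssize_t n = pread(src_fd, buf, want, r->off + done);
        if (n <= 0)
            break;                      /* error or unexpected EOF */
        pwrite(dst_fd, buf, (size_t)n, r->off + done);
        done += n;
    }
    free(buf);
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }
    src_fd = open(argv[1], O_RDONLY);
    if (src_fd < 0) { perror(argv[1]); return 1; }
    struct stat st;
    fstat(src_fd, &st);
    dst_fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, st.st_mode);
    if (dst_fd < 0) { perror(argv[2]); return 1; }
    ftruncate(dst_fd, st.st_size);      /* size dst so threads can write anywhere */

    pthread_t tid[NTHREADS];
    struct range r[NTHREADS];
    off_t chunk = st.st_size / NTHREADS;
    for (int i = 0; i < NTHREADS; i++) {
        r[i].off = i * chunk;           /* last thread takes the remainder */
        r[i].len = (i == NTHREADS - 1) ? st.st_size - r[i].off : chunk;
        pthread_create(&tid[i], NULL, copy_range, &r[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    close(src_fd);
    close(dst_fd);
    return 0;
}

Preallocating the destination with ftruncate() is what lets each thread pwrite() into its own range with no locking between threads.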
44%
14.09.2021
ACC, and MPI code. I carefully watch the load on each core with GKrellM, and I can see the scheduler move processes from one core to another. Even when I leave one or two cores free for system processes
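When that migration is unwanted, the usual remedy is to pin processes to cores. Here is a minimal Linux-specific sketch (mine, not the article's) using sched_setaffinity(); MPI launchers provide equivalent binding options:

/* pin.c - pin the calling process to one core (Linux-specific).
 * Build: cc pin.c -o pin */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int core = (argc > 1) ? atoi(argv[1]) : 0;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to core %d; currently on core %d\n", core, sched_getcpu());
    /* ... the real workload would run here, never migrating ... */
    return 0;
}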
46%
18.08.2021
part, darshan-util, postprocesses the data.
Darshan gathers its data either by compile-time wrappers or dynamic library preloading. For Message Passing Interface (MPI) applications, you can use
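For a dynamically linked binary, the preloading route needs no recompilation: point LD_PRELOAD at the Darshan runtime library before launching. The install path and application name here are illustrative:

$ export LD_PRELOAD=/usr/local/darshan/lib/libdarshan.so
$ mpirun -np 4 ./my_app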
40%
21.01.2021
1992      i486DX2   2:1 clock multiplier, 40/20, 50/25, 66/33 speeds; L2 on MB
Mar 1994  i486DX4   3:1 clock multiplier, 75/25, 100/33 speeds; 16KB L1 cache on-die, L2